ADWISE: Adaptive Window-based Streaming Edge Partitioning for High-Speed Graph Processing
نویسندگان
چکیده
In recent years, the graph partitioning problem gained importance as a mandatory preprocessing step for distributed graph processing on very large graphs. Existing graph partitioning algorithms minimize partitioning latency by assigning individual graph edges to partitions in a streaming manner — at the cost of reduced partitioning quality. However, we argue that the mere minimization of partitioning latency is not the optimal design choice in terms of minimizing total graph analysis latency, i.e., the sum of partitioning and processing latency. Instead, for complex and long-running graph processing algorithms that run on very large graphs, it is beneficial to invest more time into graph partitioning to reach a higher partitioning quality — which drastically reduces graph processing latency. In this paper, we propose ADWISE, a novel window-based streaming partitioning algorithm that increases the partitioning quality by always choosing the best edge from a set of edges for assignment to a partition. In doing so, ADWISE controls the partitioning latency by adapting the window size dynamically at run-time. Our evaluations show that ADWISE can reach the sweet spot between graph partitioning latency and graph processing latency, reducing the total latency of partitioning plus processing by up to 23− 47 percent compared to the state-of-the-art. Keywords-Graph partitioning; vertex-cut; edge partitioning; adaptive; window; streaming; distributed graph processing
منابع مشابه
Workload-aware Streaming Graph Partitioning
Partitioning large graphs, in order to balance storage and processing costs across multiple physical machines, is becoming increasingly necessary as the typical scale of graph data continues to increase. A partitioning, however, may introduce query processing latency due to inter-partition communication overhead, especially if the query workload exhibits skew, frequently traversing a limited su...
متن کاملDesign and Test of the Real-time Text mining dashboard for Twitter
One of today's major research trends in the field of information systems is the discovery of implicit knowledge hidden in dataset that is currently being produced at high speed, large volumes and with a wide variety of formats. Data with such features is called big data. Extracting, processing, and visualizing the huge amount of data, today has become one of the concerns of data science scholar...
متن کاملReally Big Data: Analytics on Graphs with Trillions of Edges (Keynote Abstract)
Big graphs occur naturally in many applications, most obviously in social networks, but also in many other areas such as biology and forensics. Current approaches to processing large graphs use either supercomputers or very large clusters. In both cases the entire graph must reside in memory before it can be processed. We are pursuing an alternative approach, processing graphs from secondary st...
متن کاملUsing an Evaluator Fixed Structure Learning Automata in Sampling of Social Networks
Social networks are streaming, diverse and include a wide range of edges so that continuously evolves over time and formed by the activities among users (such as tweets, emails, etc.), where each activity among its users, adds an edge to the network graph. Despite their popularities, the dynamicity and large size of most social networks make it difficult or impossible to study the entire networ...
متن کاملModeling, Analysis, and Experimental Comparison of Streaming Graph-Partitioning Policies: A Technical Report
In recent years, many distributed graph-processing systems have been designed and developed to analyze large-scale graphs. For all distributed graph-processing systems, partitioning graphs is a key part of processing and an important aspect of achieve good processing performance. To keep low the performance of partitioning graphs, even when processing the ever-increasing modern graphs, many pre...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1712.08367 شماره
صفحات -
تاریخ انتشار 2017